SimpleETL: ETL Processing by Simple Specifications
نویسندگان
چکیده
Massive quantities of data are today collected from many sources. However, it is often labor-intensive to handle and integrate these data sources into a data warehouse. Further, the complexity is increased when specific requirements exist. One such new requirement, is the right to be forgottenwhere an organization upon request must delete all data about an individual. Another requirement is when facts are updated retrospectively. In this paper, we present the general framework SimpleETL which is currently used for Extract-Transform-Load (ETL) processing in a company with such requirements. SimpleETL automatically handles all database interactions such as creating fact tables, dimensions, and foreign keys. The framework also has features for handling version management of facts and implements four different methods for handling deleted facts. The framework enables, e.g., data scientists, to program complete and complex ETL solutions very efficiently with only few lines of code, which is demonstrated with a real-world example.
منابع مشابه
UnifiedViews: Towards ETL Tool for Simple yet Powerfull RDF Data Management
We present UnifiedViews, an Extract-Transform-Load (ETL) framework that allows users to define, execute, monitor, debug, schedule, and share ETL data processing tasks, which may employ custom plugins (data processing units, DPUs) created by users. UnifiedViews differs from other ETL frameworks by natively supporting RDF data and ontologies. In this paper, we introduce UnifiedViews and discuss f...
متن کاملRequirements Analysis Method For Extracting-Transformation-Loading (Etl) In Data Warehouse Systems
The data warehouse (DW) system design involves several tasks such as defining the DW schemas and the ETL processes specifications, and these have been extensively studied and practiced for many years. The problems in heterogeneous data integration are still far from being resolved due to the complexity of ETL processes and the fundamental problems of data conflicts in information sharing enviro...
متن کاملEntropy Guided Transformation Learning
This work presents Entropy Guided Transformation Learning (ETL), a new machine learning algorithm for classification tasks. It generalizes Transformation Based Learning (TBL) by automatically solving the TBL bottleneck: the construction of good template sets. We also present ETL Committee, an ensemble method that uses ETL as the base learner. The main advantage of ETL is its easy applicability ...
متن کاملDynamic-ETL: a hybrid approach for health data extraction, transformation and loading
BACKGROUND Electronic health records (EHRs) contain detailed clinical data stored in proprietary formats with non-standard codes and structures. Participating in multi-site clinical research networks requires EHR data to be restructured and transformed into a common format and standard terminologies, and optimally linked to other data sources. The expertise and scalable solutions needed to tran...
متن کاملRameps: a Goal-ontology Approach to Analyse the Requirements for Data Warehouse Systems
The data warehouse (DW) systems design involves several tasks such as defining the DW schemas and the ETL processes specifications, and these have been extensively studied and practiced for many years. However, the problems in heterogeneous data integration are still far from being resolved due to the complexity of ETL processes and the fundamental problems of data conflicts in information shar...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2018